From 3-d speaker cloning to text-to-audiovisual-speech

Authors

  • Sascha Fagel
  • Frédéric Elisei
  • Gérard Bailly
Abstract

Visible speech movements were motion-captured and parameterized. Coarticulated targets were extracted from VCV (vowel-consonant-vowel) sequences and modeled so that arbitrary German utterances could be generated by target interpolation. The system was extended to synthesize English utterances by mapping English phonemes to German phonemes. An evaluation using a modified rhyme test shows that adding the synthetic videos of isolated words to an audio-only presentation raises recognition scores from 27% to 47.5%.
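As a rough illustration of what "target interpolation" can mean, the sketch below linearly interpolates articulatory parameter vectors between timed targets. The abstract does not state which interpolation scheme the authors used, so the linear scheme, the function name `interpolate_targets`, and the example parameter values are all assumptions made for illustration only.

```python
from bisect import bisect_right


def interpolate_targets(times, targets, t):
    """Return the parameter vector at time t by linear interpolation.

    times   -- strictly increasing target times in seconds (hypothetical)
    targets -- one parameter vector per target time (hypothetical values)
    Linear interpolation is an assumption; the paper may use another scheme.
    """
    if not times or len(times) != len(targets):
        raise ValueError("times and targets must be non-empty and equal in length")
    # Hold the first or last target outside the covered time range.
    if t <= times[0]:
        return list(targets[0])
    if t >= times[-1]:
        return list(targets[-1])
    # Find the segment [times[i], times[i + 1]) that contains t.
    i = bisect_right(times, t) - 1
    t0, t1 = times[i], times[i + 1]
    w = (t - t0) / (t1 - t0)
    return [(1 - w) * a + w * b for a, b in zip(targets[i], targets[i + 1])]


# Two made-up parameters (e.g. jaw opening, lip rounding) over three targets.
times = [0.0, 0.1, 0.3]
targets = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
frame = interpolate_targets(times, targets, 0.05)
```

Sampling this function at the video frame rate would yield one parameter vector per frame, which could then drive a face model.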


Similar resources

German text-to-audiovisual-speech by 3-d speaker cloning

Visible speech movements were optically motion captured and parameterized by means of a guided PCA. Co-articulated consonantal targets were extracted from VCVs, vocalic targets were extracted from these VCVs and from sustained vowels. Targets were selected or combined to derive target sequences for phone chains of arbitrary German utterances. Parameter trajectories for these utterances are gene...


A text-to-audiovisual-speech synthesizer for French

An audiovisual speech synthesizer from unlimited French text is here presented. It uses a 3-D parametric model of the face. The facial model is controlled by eight parameters. Target values have been assigned to the parameters, for each French viseme, based upon measurements made on a human speaker. Parameter trajectories are modeled by means of dominance functions associated with each paramete...


Some Experiments in Audio-Visual Speech Processing

Natural speech is produced by the vocal organs of a particular talker. The acoustic features of the speech signal must therefore be correlated with the movements of the articulators (lips, jaw, tongue, velum,...). For instance, hearing impaired people (and not only them) improve their understanding of speech by lip reading. This chapter is an overview of audiovisual speech processing with empha...


Intelligibility of natural and 3d-cloned German speech

We investigate the intelligibility of natural visual and audiovisual speech compared to re-synthesized speech movements rendered by a talking head. This talking head is created using the speaker cloning methodology of the Institut de la Communication Parlée in Grenoble (now department for speech and cognition in GIPSA-Lab). A German speaker with colored markers on the face was recorded audiovis...


Audiovisual speech synthesis: An overview of the state-of-the-art

We live in a world where there are countless interactions with computer systems in everyday situations. In the ideal case, this interaction feels as familiar and as natural as the communication we experience with other humans. To this end, an ideal means of communication between a user and a computer system consists of audiovisual speech signals. Audiovisual text-to-speech technology allo...




Publication date: 2008